---
sidebar_position: 0
---

# Retrievers

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used
as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document`s as output.

## Advanced Retrieval Types

LangChain provides several advanced retrieval types. A full list is below, along with the following information:

**Name**: Name of the retrieval algorithm.

**Index Type**: Which index type (if any) this relies on.

**Uses an LLM**: Whether this retrieval method uses an LLM.

**When to Use**: Our commentary on when you should consider using this retrieval method.

**Description**: Description of what this retrieval algorithm is doing.

| Name                                                                                            | Index Type                   | Uses an LLM               | When to Use                                                                                                                             | Description                                                                                                                                                                                                                                                          |
| ----------------------------------------------------------------------------------------------- | ---------------------------- | ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [Vectorstore](/docs/modules/data_connection/retrievers/vectorstore)                             | Vectorstore                  | No                        | If you are just getting started and looking for something quick and easy.                                                               | This is the simplest method and the one that is easiest to get started with. It involves creating embeddings for each piece of text.                                                                                                                                 |
| [ParentDocument](/docs/modules/data_connection/retrievers/parent-document-retriever)            | Vectorstore + Document Store | No                        | If your pages have lots of smaller pieces of distinct information that are best indexed by themselves, but best retrieved all together. | This involves indexing multiple chunks for each document. Then you find the chunks that are most similar in embedding space, but you retrieve the whole parent document and return that (rather than individual chunks).                                             |
| [Multi Vector](/docs/modules/data_connection/retrievers/multi-vector-retriever)                 | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself.                    | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions.                                                                                     |
| [Self Query](/docs/modules/data_connection/retrievers/self_query/)                              | Vectorstore                  | Yes                       | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text.    | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filter to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself).                 |
| [Contextual Compression](/docs/modules/data_connection/retrievers/contextual_compression)       | Any                          | Sometimes                 | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM.                   | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM.                                                                                   |
| [Time-Weighted Vectorstore](/docs/modules/data_connection/retrievers/time_weighted_vectorstore) | Vectorstore                  | No                        | If you have timestamps associated with your documents and you want to retrieve the most recent ones.                                    | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents).                                                                                                       |
| [Multi-Query Retriever](/docs/modules/data_connection/retrievers/multi-query-retriever)         | Any                          | Yes                       | If users are asking questions that are complex and require multiple pieces of distinct information to respond.                          | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them. |
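
As an example of what these algorithms do, the Time-Weighted Vectorstore row can be sketched in a few lines. This sketch assumes a scoring rule of the form `similarity + (1 - decayRate) ** hoursPassed`, so a document's recency bonus starts at 1 and shrinks the longer it goes unaccessed. The `ScoredDoc` type, its hypothetical `lastAccessedAt` timestamp field, the helper functions, and the default `decayRate` of 0.01 are assumptions for illustration, not LangChain's implementation.

```typescript
// Hypothetical shape of a candidate returned by a similarity search.
interface ScoredDoc {
  pageContent: string;
  similarity: number; // assumed to lie in [0, 1]; higher means more similar
  lastAccessedAt: Date;
}

const MS_PER_HOUR = 60 * 60 * 1000;

// Assumed scoring rule: similarity plus a recency bonus that decays
// exponentially with the hours since the document was last accessed.
// A small decayRate keeps old documents competitive; a large one makes
// the bonus vanish quickly.
function timeWeightedScore(doc: ScoredDoc, now: Date, decayRate = 0.01): number {
  const hoursPassed = (now.getTime() - doc.lastAccessedAt.getTime()) / MS_PER_HOUR;
  return doc.similarity + Math.pow(1 - decayRate, hoursPassed);
}

// Rank candidates by the combined score (highest first) and keep the top k.
function rankByTimeWeightedScore(
  docs: ScoredDoc[],
  now: Date,
  k: number,
  decayRate = 0.01,
): ScoredDoc[] {
  return [...docs]
    .sort((a, b) => timeWeightedScore(b, now, decayRate) - timeWeightedScore(a, now, decayRate))
    .slice(0, k);
}

// Usage: a slightly less similar but fresh document outranks a stale one.
const now = new Date("2024-01-02T00:00:00Z");
const ranked = rankByTimeWeightedScore(
  [
    { pageContent: "stale", similarity: 0.8, lastAccessedAt: new Date("2024-01-01T00:00:00Z") },
    { pageContent: "fresh", similarity: 0.7, lastAccessedAt: now },
  ],
  now,
  1,
);
console.log(ranked.map((d) => d.pageContent)); // [ 'fresh' ]
```

With the default decay, the stale document's bonus after 24 hours is about 0.79, so its total (about 1.59) falls below the fresh document's 1.7.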

## [Third Party Integrations](/docs/integrations/retrievers/)

LangChain also integrates with many third-party retrieval services. For a full list, see [all retriever integrations](/docs/integrations/retrievers/).

## Get started

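A simplified view of the public API of the `BaseRetriever` class in LangChain.js is as follows:

```typescript
import type { Document } from "@langchain/core/documents";

export abstract class BaseRetriever {
  // Return the documents relevant to `query`; each retriever defines "relevance".
  abstract getRelevantDocuments(query: string): Promise<Document[]>;
}
```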

It's that simple! You can call `getRelevantDocuments` to retrieve documents relevant to a query, where "relevance" is defined by
the specific retriever object you are calling.
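
Because the contract is just "string in, `Document[]` out", it can be modeled without any dependencies. The sketch below is a self-contained stand-in, not LangChain's class: `Doc`, `SimpleRetriever`, and `DelegatingRetriever` are hypothetical names. It also illustrates that a retriever need not store documents itself; this one delegates to whatever lookup function it is given.

```typescript
// Hypothetical stand-in for LangChain's Document: text plus arbitrary metadata.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical stand-in for BaseRetriever: one async method, query in, documents out.
abstract class SimpleRetriever {
  abstract getRelevantDocuments(query: string): Promise<Doc[]>;
}

// A retriever that stores nothing itself: it delegates to an injected lookup
// function (which could wrap a database query, a search API, and so on), so
// "relevance" is whatever that function decides.
class DelegatingRetriever extends SimpleRetriever {
  private readonly lookup: (query: string) => Promise<Doc[]>;

  constructor(lookup: (query: string) => Promise<Doc[]>) {
    super();
    this.lookup = lookup;
  }

  async getRelevantDocuments(query: string): Promise<Doc[]> {
    return this.lookup(query);
  }
}

// Usage: a naive case-insensitive substring match over a list that stands in
// for an external data source.
const corpus: Doc[] = [
  { pageContent: "Retrievers return documents for a query.", metadata: { id: 1 } },
  { pageContent: "Vector stores hold embeddings.", metadata: { id: 2 } },
];
const retriever = new DelegatingRetriever(async (query) =>
  corpus.filter((doc) => doc.pageContent.toLowerCase().includes(query.toLowerCase())),
);
retriever
  .getRelevantDocuments("retrievers")
  .then((docs) => console.log(docs.map((d) => d.metadata.id))); // [ 1 ]
```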

LangChain also provides implementations of the retrievers we think are most useful. The main type of retriever in LangChain is the vector store retriever, which is what we focus on here.

**Note:** Before reading, it's important to understand [what a vector store is](/docs/modules/data_connection/vectorstores).
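
To make the idea of a vector store retriever concrete, here is a dependency-free sketch of what one does under the hood: embed each stored text, embed the query, and return the k nearest texts by cosine similarity. Everything here is hypothetical and simplified. `ToyVectorStore`, its word-count `embed` function, and the `asRetriever(k)` helper (named after the vector-store-to-retriever conversion used later on this page) stand in for a real embedding model and an index such as HNSWLib, which performs approximate nearest-neighbor search rather than the brute-force scan shown here.

```typescript
// Toy embedding (an assumption for illustration): word counts over a fixed
// vocabulary. A real setup would call an embedding model instead.
function embed(text: string, vocabulary: string[]): number[] {
  const words: string[] = text.toLowerCase().match(/[a-z]+/g) ?? [];
  return vocabulary.map((term) => words.filter((word) => word === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical in-memory vector store: each text is stored with its embedding.
class ToyVectorStore {
  private readonly entries: { text: string; vector: number[] }[] = [];
  private readonly vocabulary: string[];

  constructor(vocabulary: string[]) {
    this.vocabulary = vocabulary;
  }

  addTexts(texts: string[]): void {
    for (const text of texts) {
      this.entries.push({ text, vector: embed(text, this.vocabulary) });
    }
  }

  // Return the k stored texts whose embeddings are closest to the query's.
  similaritySearch(query: string, k: number): string[] {
    const queryVector = embed(query, this.vocabulary);
    return this.entries
      .map((entry) => ({ text: entry.text, score: cosineSimilarity(queryVector, entry.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k)
      .map((scored) => scored.text);
  }

  // Wrap the store in the retriever shape: query in, documents out.
  asRetriever(k: number): { getRelevantDocuments(query: string): Promise<string[]> } {
    return { getRelevantDocuments: async (query) => this.similaritySearch(query, k) };
  }
}

// Usage
const store = new ToyVectorStore(["cat", "dog", "stock", "market"]);
store.addTexts(["The cat chased the dog.", "The stock market fell.", "A dog barked."]);
store
  .asRetriever(1)
  .getRelevantDocuments("How is the market doing?")
  .then((docs) => console.log(docs)); // [ 'The stock market fell.' ]
```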

This example showcases question answering over documents.
We chose it as the getting-started example because it combines several different elements (text splitters, embeddings, and vector stores) and then shows how to use them in a chain.

Question answering over documents consists of four steps:

1. Create an index
2. Create a Retriever from that index
3. Create a question answering chain
4. Ask questions!
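
The four steps can be sketched without any dependencies. This hypothetical version assumes steps 1 and 2 (the index and the retriever built from it) have already produced a `retrieve` function, and models the question answering chain of step 3 as the simple "stuff" approach: place every retrieved passage in one prompt and send it to a model. `Retrieve`, `CallModel`, `buildStuffedPrompt`, and `answerQuestion` are illustrative names, not LangChain APIs, and the model is a fake so the sketch runs offline.

```typescript
// Hypothetical function types standing in for a retriever and an LLM call.
type Retrieve = (query: string) => Promise<string[]>;
type CallModel = (prompt: string) => Promise<string>;

// "Stuff" every retrieved passage into one prompt along with the question.
// The template wording is illustrative, not LangChain's built-in prompt.
function buildStuffedPrompt(question: string, passages: string[]): string {
  const context = passages.map((passage, i) => `[${i + 1}] ${passage}`).join("\n");
  return [
    "Answer the question using only the context below.",
    `Context:\n${context}`,
    `Question: ${question}`,
    "Answer:",
  ].join("\n\n");
}

// A minimal question answering chain: retrieve passages, format the prompt,
// then ask the model. Calling this function corresponds to step 4.
async function answerQuestion(question: string, retrieve: Retrieve, callModel: CallModel): Promise<string> {
  const passages = await retrieve(question);
  return callModel(buildStuffedPrompt(question, passages));
}

// Usage with fakes: the "model" just reports how many numbered passages it received.
const fakeRetrieve: Retrieve = async () => ["Passage one.", "Passage two."];
const fakeModel: CallModel = async (prompt) =>
  `received ${(prompt.match(/^\[\d+\]/gm) ?? []).length} passages`;
answerQuestion("What happened?", fakeRetrieve, fakeModel).then(console.log); // received 2 passages
```

Injecting the retriever and model as plain functions keeps the chain testable; in the real example below, those roles are filled by a vector store retriever and an OpenAI model.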

Each of these steps has multiple substeps and possible configurations, but we'll walk through one common flow using HNSWLib, a local vector store.
This assumes you're using Node, but you can swap in another integration if necessary.

First, install the required dependency:

import CodeBlock from "@theme/CodeBlock";

import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

```bash npm2yarn
npm install @langchain/openai hnswlib-node @langchain/community
```

You can download the `state_of_the_union.txt` file [here](https://github.com/langchain-ai/langchain/blob/master/docs/docs/modules/state_of_the_union.txt).

import RetrievalQAExample from "@examples/chains/retrieval_qa.ts";

<CodeBlock language="typescript">{RetrievalQAExample}</CodeBlock>

Let's walk through what's happening here.

1. We first load a long text and split it into smaller documents using a text splitter.
   We then load those documents into HNSWLib, our vector store, which embeds them using the passed `OpenAIEmbeddings` instance and creates our index.

2. Although we could query the vector store directly, we convert it into a retriever so that it returns documents in the format the question-answering chain expects.

3. We initialize a retrieval chain, which we'll call later in step 4.

4. We ask questions!
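
Step 1 hinges on the text splitter. The sketch below is a simplified, hypothetical fixed-width splitter that borrows the `chunkSize` and `chunkOverlap` parameter names from LangChain's text splitters. The library's splitters (such as `RecursiveCharacterTextSplitter`) also try to break on separators like paragraph and sentence boundaries, which this version ignores.

```typescript
// Split text into fixed-size character windows. Consecutive chunks share
// `chunkOverlap` characters, so text near a boundary appears in both chunks.
function splitWithOverlap(text: string, chunkSize: number, chunkOverlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be at least 0 and less than chunkSize");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(splitWithOverlap("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

Overlap trades some duplicated storage for a lower chance that the sentence answering a question is split across two chunks that are each only partly relevant.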

See the individual sections for deeper dives on specific retrievers, or learn how to
[create your own custom retriever over any data source](/docs/modules/data_connection/retrievers/custom).
